Skip to content

bitswap_unstable_stream added#3264

Open
michalkucharczyk wants to merge 18 commits into
mainfrom
mku-bitswap-stream
Open

bitswap_unstable_stream added#3264
michalkucharczyk wants to merge 18 commits into
mainfrom
mku-bitswap-stream

Conversation

@michalkucharczyk
Copy link
Copy Markdown
Contributor

@michalkucharczyk michalkucharczyk commented May 19, 2026

This PR adds bitswap_unstable_stream method defined in paritytech/json-rpc-interface-spec#186

It also improves the e2e test for bitswap_unstable_get which was added in #3240

@michalkucharczyk michalkucharczyk marked this pull request as draft May 19, 2026 13:18
@michalkucharczyk michalkucharczyk changed the title Mku bitswap stream bitswap_unstable_stream added May 19, 2026
format!("Bitswap stream subscription: {} cids", cids.len())
);

match me.bitswap_service.bitswap_stream(cids).await {
Copy link
Copy Markdown
Contributor

@lrubasze lrubasze May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I'm right but is client connection blocked while awaiting here?

Copy link
Copy Markdown
Contributor Author

@michalkucharczyk michalkucharczyk May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The await covers only the synchronous handoff (parse_and_dedup + a channel send + a oneshot for the BatchId), not the actual Have broadcast or per-CID events - those flow later via the events-pump task pushed to background_tasks at line 1220 (light-base/src/json_rpc_service/background.rs).

The only backpressure could be the bitswap service's bounded inbound channel being saturated - I think this shall not happen.

Any ideas how to solve it in different way?

@michalkucharczyk michalkucharczyk marked this pull request as ready for review May 29, 2026 15:45
@michalkucharczyk michalkucharczyk requested a review from skunert May 29, 2026 18:28
Copy link
Copy Markdown
Collaborator

@dmitry-markin dmitry-markin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks correct and carefully written.

One thing I would do differently though, is apply the fail-fast principle and return an error to the subscription request in case the Have request failed, instead of faning-out errors for all the individual CIDs. May be there is some justification for not doing so?

// One Cancel wantlist message containing all pending CIDs, sent to every peer this
// batch's Have broadcast reached. Cancel for an unknown CID is harmless on the receiver
// side — the peer no-ops.
let message = build_bitswap_cancel_message(pending_cids.iter());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note — substrate Bitswap server implementation never tracks pending want-lists (i.e., if it doesn't have the requested CID, it will just respond "NO" and never get back to it again). So, currently, it doesn't make sense to send cancel messages. While being mostly harmful (nit the bandwidth usage), we should probably:

  • document it here and write that the message is sent only for conformance with IPFS Bitswap protocol implementation in case it is later updated on substrate's side;
  • consider "subscribing" to not yet available chunks in substrate to handle cases when the publisher submitted the transaction, communicated to the receiver the CID, and the receiver need to await while it is available in the blocks. This is the edge case that needs to be confirmed before implementing in substrate to not introduce extra code requiring maintenance in case it is not needed by anyone.

Comment on lines +665 to +667
// Invariant: parse_and_dedup caps entries.len() at MAX_CIDS_PER_REQUEST
// and events_tx is bounded(MAX_CIDS_PER_REQUEST). At most `total` items
// are ever pushed for a batch, so the buffer cannot saturate.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worth adding a note / TODO item here that not backpressuring network responses here leads to allocation of (64 * 2 MiB) = 128 MiB max, which is acceptable in a desktop environment, but can cause issue on mobile.

Comment on lines +184 to +189
/// Top-level errors: only the batch-input validation cases — `-32801 TooManyCids`,
/// `-32802 EmptyCids`, `-32803 DuplicateCids` — are surfaced at the top level. Wholesale
/// Have-broadcast failures (no peers connected / network send queue full) are NOT top-level
/// errors per the `bitswap_unstable_stream` spec: the subscription opens normally and the
/// failure fans out as one `streamItemError(-32812 FailRetryBackoff)` per remaining valid
/// CID, followed by `streamDone`.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it was chosen to not fail fast the entire subscription in case the have request failed, delivering individual errors instead?

// total events ever pushed for this batch is bounded by entries.len(); the
// receiver is also held by BitswapStreamHandle so the channel cannot be Closed.
for (cid_str, err) in invalid_slots {
batch.pending_count -= 1;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for uniformity with other places

Suggested change
batch.pending_count -= 1;
batch.pending_count = batch.pending_count.saturating_sub(1);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants